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, Abstract. Minimization problems in for Tikhonov functionals with sparsity 

\ constraints are considered. Sparsity of the solution is ensured by a weighted 

penalty term. The necessary and sufhcient condition for optimality is shown to be 
slantly differentiable (Newton differentiable), hence a semismooth Newton method 
is applicable. Local superlinear convergence of this method is proved. Numerical 
examples are provided which show that our method compares favorably with existing 
approaches. 
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g : 1 Introduction 

O ■ In this work we consider the optimization problem 

^1 Minimize -\\Ku — fW"^ + '^^Wk\uk\ over m G (1) 

Here, K : ^ Ti. is a hnear and injective operator mapping the sequence space i"^ into 
a Hilbert space H, f eH and w = {wk} is a sequence satisfying Wk > Wq > 0. 

One well understood algorithm for the solution of (1) is the so-called iterated soft- 
thresholding for which convergence has been proven in [10], see also [2,9]. While the 
iterated soft-thresholding is very easy to implement it converges very slow in practice (in 
fact the method converges linearly but with a constant very close to one [2]). Another 
well analyzed method is the iterated hard-thresholding which converges like (9(n~^/^) [3] 
(i.e. even slower than the iterated soft-thresholding but practically it is faster in many 
cases) . 

In this article we derive an algorithm for which we prove local superlinear 
convergence in the infinite dimensional setting. Our algorithm is an active set, or 
semismooth Newton, method and hence, the analysis is based on the notion of slant 
differentiability [8, 16]. The semismooth Newton method is easily implementable as an 
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active set method. Numerical experiments show that the method is robust with respect 
to the choice of the initial value and that it compares favorably with existing approaches 
in terms of computation time. 

The background for problems of type (1) is, for example, the attempt to solve 
the linear operator equation Ku = / in an infinite-dimensional Hilbert space which 
models the connection between some quantity of interest u and some measurements /. 
Often, the measurements / contain noise which makes the direct inversion ill-posed and 
practically impossible. Thus, instead of considering the linear equation, a regularized 
problem is posed for which the solution is stable with respect to noise. A common 
approach is to regularize by minimizing a Tikhonov functional [10,13,21]. A special 
class of these regular izat ions has been of recent interest, namely of the type (1). These 
problems model the fact that the quantity of interest u is composed of a few elements, 
i.e. it is sparse in some given, countable basis. To make this precise, let A : Tii 7^2 be 
a bounded operator between two Hilbert spaces and let {ipk} be an orthonormal basis 
of Hi. Denote hy B : ^ Hi the synthesis operator B{uk) = J2k''^k'4>k- Then the 
problem 



The sequence plays the role of the regularization parameter where each coefficient 
is regularized individually. However, for an analysis of the regularizing properties one 
might use awk instead and investigate a — > 0. We refer to e.g. [10,18,20] for analysis 
of the regularizing properties and parameter choice rules. 



Recently sparsity constraints have also appeared in the context of optimal control 
of PDEs [24]. 



The article is organized as follows. In Section 2 we derive a semismooth formulation 
for the minimization problem (1). Section 3 states the algorithm and local superlinear 
convergence is proven. The Section 4 presents numerical results on the regularization 
of the ill-posed problems of inverse integration and deblurring and shows an application 
to minimization in the context of compressed sensing. 

Notation. For 1 < p < oo, P denotes the space of p-summable sequences with norm 

ll'^llp — ( YlT=i I'^fel^ ) ' whereas £°° denotes the space of bounded sequences with norm 
\\u\\oo = maxfcgj^ \ uk\. Recall that these spaces satisfy £^ ^ £^ for 1 < p < g < oo and 
that \uk\ < \\u\\p holds for any u G i^. In the case p = 2 we simply write jl-ujl, and (■, ■) 
denotes the inner product in With Bp{u) we denote the open ball of radius p with 
respect to the norm of centered at u. The operator K* : H ^ £^ is the Hilbert space 
adjoint of K and L{X, Y) is the space of bounded linear operators from X to Y. 




k=l 



can be rephrased as 
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2 Optimality Conditions 

In this section we are going to derive the necessary and sufficient optimahty condition 
for the problem (1). It is going to be the basis for the semismooth Newton algorithm. 
This condition can be derived and expressed in different ways, for example by using the 
classical Lagrange duality, or by using subgradient calculus. 

Let us first address the conditions obtained by subgradient calculus. To this end 
we introduce the so-called soft-thresholding function. 

Definition 2.1. Let w = {wk} with Wk > wq > and 1 < p < oo, I < q < oo. The 
soft-thresholding of u with the sequence w is defined as the mapping : P' ^ & given 
by 

Sw{u)k = Syji^{uk) = max{0, \uk\ - Wk}sgn{uk). (2) 

Renicirk 2.2. Since elements of & are sequences converging to zero, the range of is 
^° = {li e : lifc = for almost every k}^&. 

With the help of the soft-thresholding operator, we can formulate the optimality 
condition in a compact way. 

Proposition 2.3. If K : i"^ ^ 7i is injective, the functional 

^ oo 

= -ll/TM-ZII^ + ^WfelMfel (3) 

k=l 

has a unique minimizer u G i"^. This minimizer is characterized by 

u = S^yj {u - -fK* {Ku- f)) for any > 0. (4) 

Proof. Since K is injective, ^' is strictly convex and coercive and hence, hen it has a 
unique minimizer. This minimizer is characterized by 

e 9^(12) 

which is equivalent to 

-K*{Ku- f) edFiu) (5) 

where F{u) = '^i^Wk\uk\. Multiplying with 7 > 0, adding u to both sides and inverting 
(/ -I- 'ydF) gives 

u= {I + ^ dF)-\u ~ -fK*{Ku - f)). 

(Note that (/ + 'j dF)~^ exists and is single-valued since the subgradient dF is 
maximal monotone if F is convex and lower semicontinuous [26, Proposition 32.17, 
Corollary 32.30].) A straightforward calculation shows that 

□ 
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Prom the characterization (4) and Remark 2.2 we can derive the following corollary. 

Corollary 2.4. The minimizer u of (3) is a finitely supported sequence. 

For convenience we also derive the optimality condition by Lagrange duality. We 
split the functional ^ according to 

■^{u) = G{Ku) + F{u) 

with G{h) — \\h — fW^/^ and F{u) — X^fe '"^fcl'^fel- To state the dual problem, we use 
the dual variable p which shall not be confused with exponents for £p spaces. The dual 
variable appears only in this section. The dual problem of (1) is defined as 

Maximize - F*{K*p) - G*{-p) over p e 7i. (6) 

This can be expressed as (see the appendix) 

Maximize — + (p, /) over p 

subject to |i^*p|fe < Wk for ^ k. 

The extremality conditions [12, Ch. III. 4] are: 

F{u) + F*{K*p) - {K*p, u) = (7a) 
G{Ku) + G*{-p) + {p, Ku) = 0. (7b) 

The first condition (7a) yields 

oo 

y^Wfc \uk\ -{K*p,u) = and \{K*p)k\<Wk 



k-l >Q 



Wfc, if Mfc > 
-Wk, if Mfc < 



^ Uk^O or {K*p)k = Wk sign Uk = 

and \{K*p)k\ < Wk- 
This condition can be written as the complementarity system 

K*p - w < 0, > 0, \K*p - w] = 



(8) 



-K*p - w < 0, M" > 0, [K*p + w]u- = 0, 
in a coordinatewise sense, which is in turn equivalent to 

u — max{0, + 7 {K*p — w)} + min{0, m + 7 {K*p + w)} (9) 

for any 7 > 0. 

The second condition (7b) yields 

\\\Ku - ffn +\\\- p\\l + /) + (p, Ku) = ^\\Ku -f + pII^ = 
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and thus 

Ku- f +p^0. (10) 
By plugging (10) into (9) we end up with 

u - max{0, M - 7 {K*{Ku - f) + w)} - min{0, m - 7 {K*{Ku - /) - w)} = 0, (11) 

which is just another way to express (4). 

Remark 2.5. The usual characterization e 9^('u) of the unique minimizer u of (1) 
is diffucult to handle for numerical algorithms because it is a nonsmooth inclusion. One 
attempt to tackle the problem is by interior point regularization as proposed in [17]. 
This, however, introduces additional nonlinearities into the problem. By contrast, our 
algorithm is based on the necessary and sufficient condition (11). As we shall prove in 
the following section, (11) is a semismooth equation in i'^, so that Newton's method can 
be applied. 

3 Semismooth Newton Method 

The previous section has shown that we can solve the minimization problem (1) by 
solving the equation (4) or (11), or briefly 

J^{u) = u-S^^{u- -iK*{Ku - /)) = 0, (12) 

for some 7 > 0. 

This is an operator equation in the space involving the non-differentiable max 
and min operations. Optimality conditions of this form frequently also occur in the 
context of optimal control problems for partial differential equations, in the presence 
of control constraints. Then (12) is considered in W function spaces, and it is known 
that the max operation, i.e., u 1— > max{0, w}, is so-called Newton or slantly differentiable 
from i7 to for 1 < ? < p < 00, see [8, Theorem 2.6] in view of its Lipschitz continuity. 
In the presence of a norm gap 1 < < p < 00, the generalized derivative, or slanting 
function, can be chosen as an indicator function, see [16, Proposition 4.1]. This allows 
for the interpretation of the generalized Newton method as a so-called active set method. 
This norm gap is made up for in the context of partial differential equation because K 
and K* are solution operators which provide the necessary smoothing. 

It turns out that the behavior of the max and min operations is more intricate than 
in function space. Again, it follows from the Lipschitz continuity of w 1— > max{0,M} 
from & to £^ that slant differentiability holds [8, Theorem 2.6] for 1 < p,q < 00. 
However, we are not aware of any simple slanting function even with norm gap which 
can be algoithmically exploited, see Remark 3.2. It may be surprising that nonetheless, 
the soft-thresholding operator S^j and thus equation (12) are slantly differentiable and 
admit a simple slanting function between any pair of l'^ spaces, see Proposition 3.3. 
This allows us to apply a generalized Newton's method to solve (12), which takes the 
form of an active set method. 
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3.1. Semismoothness of the optimality condition. The concept of slant or 
Newton differentiability is closely related to the notion of semismoothness [8,16,25], 
and we will use the terms interchangeably. 

Definition 3.1. Let X and Y he Banach spaces and D C X be an open subset. A 
mapping T : D is called Newton (or slantly) differentiable in x & D if there exists 
a family of mappings Q : D ^ L{X, Y) such that 



The function Q is called a generalized derivative (or slanting function) for T m. x. 

It is shown in [8] that any Lipschitz continuous function is Newton differentiable. 
However, this is only of little help algorithmically unless there is a generalized deriative 
Q{u) of (12) which is easily invertible. 

Remark 3.2. A natural candidate for a generalized derivative Q of the function 
T{u) = max(0, u) is 

{hk ,Uk>0 
Shk , life = for any 5 eR. 
, life < 

We are going to show that this Q can not serve as a generalized derivative of J-" : ^ i"^ 
for any p G [l,oo[ and 1 < g < oo. We consider a point u E P' for which the set 
{n \ Un ^ 0} is infinite and take a special sequence of h"' e i^, namely 



K 



for k ^ n 
-2uk for k — n. 



Hence, we have \\h^\\p — 2|m„| for n — > oo. It is an easy calculation to see that 

||max{u + /i",0}-max{ii,0}-G'(w + /i")/i"|L 1 , „ 

— — — - for all n with Un 0. 



\\h% 

The following proposition shows that the thresholding operator (2) is Newton 
differentiable and that a function similar to Q serves as a generalized derivative. 

Proposition 3.3. The mapping '■ i'^ ^ i'^ from Definition 2.1 is Newton 
differentiable for any 1 < p < oo, 1 < q < oo. A generalized derivative is given 
by 





for 


\Uk\ 


> Wk 




for 


\Uk\ 


< Wk 
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Proof. Without loss of generality we may assume \\h\\p < ^ and hence \hk\ < Since 
u E £^ with p < oo there exists ko such that \uk\ < ^ ior k > ko- We estimate 



\S^{u + h)-SM-Si^ + h){h)\\l 

oo 

= X] \^vki'^k + hk) - Sy,^^{uk) -Q{u + h){h)k\ 



k=i 



^ \Sw^^{uk + hk) - S^^{uk) -Q{u + h){h)k\ 



k<kQ 



It is easy to check that the above sum is zero for 

Ip < min{||xife| —Wk\ : k < ko and \uk\ Wk} 



because \hk\ < \\h\\p holds. It follows that 



\Sy,{u + h) - Sy,{u) - Q{u + h){h)\\q _ ^ 



for \\h\\p small enough, which proves Newton differentiabihty. □ 

Remark 3.4. In matrix notation we can express the generalized derivative Q{u) as 



g{u) 



I A 





where ^ = {A; G N : \uk\ > Wk}- 

To calculate a generalized derivative for the mapping in (12), we prove a chain 
rule for the generalized derivative. 

Lemma 3.5. Let S : X ^ Y be Newton differentiahle, A E L{X,X) and y E X. 
Let furthermore Q be a generalized derivative of S. Define T{u) = S{Au + y). Then 
H{u) = Q{Au + y)A is a generalized derivative ofT. 

Proof. It holds 

\\T{u + h)- T{u) - H{u + h) h\\ 

m 

_ \\S{Au + Ah + y)- S{Au + y) - GjAu + Ah + y)Ah\\ \\Ah\\ 

\\Ah\\ \\h\\ ■ 

The right hand side converges to zero because ^ is a generalized derivative of S in Au + y 
in the direction Ah, and A is bounded. □ 

In order to specify a generalized derivative of JF, we introduce the active and the 
inactive sets. For the sake of simplicity we will restrict ourself to the case T : ^ 
in the following 
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Definition 3.6. For u e , the active set A{u) and the inactive set 1{u) are given by 

A(u) ^{ke N:\u- ^K*(Ku - f)\k > T^'fe} 
I(u) = {ke n:\u- ^K*(Ku -f)\k< JWk}. 

Whenever the active and inactive sets correspond to an iterate u'\ we will denote them 
by An and In, respectively. We will drop the subscript or the argument if no ambiguity 
can occur. 

We are now in the position to calculate a generalized derivative of !F. 
Proposition 3.7. The mapping T : P' ^ , 

T{u) = u — S^wiu — jK*{Ku — /)) 

is Newton dijferentiable. Denote the active and inactive set A and I as in Definition 3.6 
and split the operator K*K according to 



K*K - 

Then a generalized derivative is given by 



Maa Mm 
Mia Mxi 




h 

Proof. The claim follows from the sum rule for the generaized derivative and from 
Proposition 3.3 and Lemma 3.5 with S = S^^, A = I — jK*K, y = ■jK*f. □ 

Remark 3.8. Note that for any u G P , the active set A is always finite, since 
u — ^K*{Ku - f) e P holds and thus \u - jK*{Ku - f)\k ^ for k ^ oo. 



3.2. Semismooth Newton method. The semismooth or generalized Newton 
method for the solution of (12) can be stated as the iteration 

= (15) 

where ^ is a generalized derivative of J^. We use the generalized derivative Q given 
by (14). Naturally, the semismooth Newton method can be interpreted as an active set 
method, and we state it as Algorithm 1. 

Remark 3.9. (i) Algorithm 1 is the generalized Newton method (15). The unique 

solvability in step 8 is shown in Proposition 3.11 below. 

(a) Given an initial iterate E P, the algorithm is well-defined, and all iterates remain 
in P. We refer again to Proposition 3.11 below. 

(Hi) At the end of step 10, the iterate u^^^ satisfies = 0. Note that r^^ = u^^ holds 
which implies 6ux„ = —Uj^. 

Note that {Hi) implies that all iterates m"^ (n > 1) of Algorithm 1 are finitely 
supported sequences. However, K*{Ku'^ — f) is in general not finitely supported, and 
hence in a practical implentation, this term will be truncated after a number of entries. 
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Algorithm 1 Semismooth Newton method for the solution of (12). 
1: Initialize u^, choose 7 > 0, set n := and done := false 
2: while n < Umax and not done do 
3: Calculate the active and inactive sets: 

^ = {A; e N : - ^K*{Ku^ - f)\k > n} 
X = {A; e N : - ^K*{Ku'' - f)\k < 7«^fe}- 

4: Compute the residual 

= J^(u'') = li" - <S^^K - iK^Ku"" - /)). 



9 

10 
11 
12 



if ||r"|| < e then 

done := true 
else 

Calculate the Newton update by solving 



It / [Sux [r^ 



Update u^^^ := u"" + Su 
Set n := n + 1 
end if 
end while 



3.3. Active set method. One may set up Algorithm 1 equivalently as an active set 
method. This can be seen by a closer analysis of the Newton step (step 8 and 9 in 
Algorithm 1): 



^M2, -M2,Mai\ [7[K*{Ku^-f)]A±WA) 



h ) \ 

4 - M-^{[K*{Ku^ - f)]A ±WA- Maiu^) 



'M2,{K*f±w)\A' 

The sign of w depends of the sign of u"' — ^K*{Ku"' — /). Hence, instead of calculating 
the Newton update in step 8, one may set Uj^^ := and solve A4.aau^^ = {K* f±w)\A- 
This shows that the subsequent iterate ■u""''^ depends on the current iterate -u" solely 
through the active set A. As a consequence, differing values of m" can lead to the same 
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next iterate 

For completeness, we state the active set method as Algorithm 2. Note that the 
algorithm is initialized with an active set A instead of u^. 

Algorithm 2 Active set method for the solution of (12). 



Initialize Aq, Aq , choose 7 > 0, set n :— and done := false 
Set Ao ^ At U Ao , lo ^ \ Ao 
Set the signs of the weights: 




4: while n < rimax and not done do 

5: Set Uj^ — and calculate by solving 

6: Calculate the new active sets: 



A^ = {keN 



[u"-ryK*{Ku^-f)]k<-^Wk} 
\u^-^K*(Ku^-f)\k<^Wk}. 



7: Set the signs of the weights: 



-.n+l 




9 
10 
11 
12 



if s"+i = then 

done := true 
end if 

Set n n + l 
end while 



In this setting, the stopping criterion is coincidence of the active sets in consecutive 
iterations — other choices are also possible. In the numerical examples in Section 4 we 
chose the norm of the residual because a sudden drop of the residual norm occured 
before the minimizer was identified. 

3.4. Local convergence of the semismooth Newton method. The local 
superlinear convergence of the semismooth Newton method (Algorithm 1) hinges upon 
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the uniform boundedness of Q{u'")~^ during the iteration. 

Proposition 3.10. There exists kg and p > such that \\u — u\\2 < p implies that 

A{u) C [l,A;o]. 

Moreover, ko and p depend only on 7, u, and wq. 

Proof. The triangle inequahty imphes 

1^ - ^K*{Ku -f)\k<\u- ^K*(Ku -f)\k+\u-u- ^K*K{u - u)\k. (16) 
The first term can be estimated by 

In - ^K*{Ku - f% < \u\k + 7 \K*Ku\k + 7 I^VU- 

Since u, K*Ku and K*f arc in i"^, the right hand side converges to as — > 00. In 
particular, there exists ko, depending only on the named quantities, such that 

In - jK*{Ku - /) Ifc < 7 Wo/ 2 for all k > kg. (17) 

The second term in (16) can be estimated by 

\u — u — ■jK*K{u — u)\k<: \u — u\k + 7 \K*K{u — u)\k 

< \\u-u\\+-f\\K*K{u-u)\\ < {l + -f\\K*K\\)\\u-u\\. 

Hence there exists p > 0, depending only on the named quantities, such that 

\u-u--fK*K{u-u)\k <-fWo/2 for all A; eN. (18) 

Combining (16)-(18) proves the claim. □ 

At this point, we cannot yet conclude that the active sets remain uniformly bounded 
during the iteration of Algorithm 1, since it is not evident whether the iterates will 
remain in a suitable p-neighborhood of u. 

Proposition 3.11. The generalized derivative Q, given by (I4), is boundedly invertible 
from i"^ into i"^. Moreover, the norm of Q{u)~^ can be estimated by 

ii^«'ii<iiA^^:4ii(^+ii-M^ii)+i, 

where A and T are the active and inactive sets at u, see Definition 3. 6. 
Proof. Let u,r & and consider the equation Q{u) 5u — r, i.e., 

iMaa iMaj\ f 5uA ^ f rA 
Ij J \6uj J yrxj- 
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Necessarily, Suj — rj holds, which implies Sux e It remains to solve 



iMaa Sua = rA- iMja 6ux. 



(19) 



The right hand side is an element of P. Moreover, M.aa is injective. (We rewrite 
Maa = PaK*KPa = {KPaYKPa, where Pa the projection of £^ onto the active set. 
Then M.aau = implies ||A'P4m|P = {u,M.aau) = 0, and hence ua — ^ since K is 
injective.) By Remark 3.8, the active set is finite, and thus M.aa is an injective operator 
on a finite dimensional space, hence it is also surjective. We conclude that (19) has a 
unique solution 5ua G hence Q{u)~^ : £^ — > £^ exists. 
The norm estimate follows from 



7" "-AA 





It 




< ^\\M:^\\\\rA\\ + ||A^:^||||Al^||||ri|| + ||rx| 

< {\\M-a\\\{\ + \\Mai\\) + i) \\r\ 



□ 



Corollary 3.12. Let ko E N and p > be as in Proposition 3.10. Then Q{u) ^ is 
uniformly hounded on Bp{u). 

Proof. Let u E i"^ such that \\u — u\\ < p. By Proposition 3.10, the active set satisfies 
A{u) C [1, ko]. Our plan is to show that ||^(m)"^|| indeed depends only on kQ. Indeed, 
we define 

Note that for every A C [1, ko], 0, AiAA is boundedly invertible, hence C{ko) is the 
maximum of finitely many positive numbers. Moreover. A4ai is obtained from K*K 
by restriction and extension, hence < ||A'*A'|| holds, for all choices of A and X. 

Prom Proposition 3.11, we conclude that 



\\g{u)-'\\<C{ko){^ + \\K*K\\) + l. 



□ 



We may now combine the results above to argue the local superlinear convergence 
of Algorithm 1. 

Theorem 3.13. There exists a radius r G (0,p] such that \\u^ < r implies that all 
iterates of Algorithm 1 satisfy Wu^ — u\\ < r, and u"' superlinearly. 

Proof. By Corollary 3.12, the inverse of the generalized derivative, Q{u)~^, remains 
uniformly bounded in Bp{u). The result is then a standard conclusion for generalized 
Newton methods, see [7, Remark 2.7], or [16, Theorem 1.1]. □ 
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RemEirk 3.14. (i) The neighborhood in which superlinear convergence occurs is 
unknown and may be small. The global convergence behavior of the algorithm 
thus deserves further investigation. The numerical experiments in the following 
section suggest that the choice of 7 is essential in achieving convergence from a 
bad initial guess. For a related problem in Hilbert spaces with a standard Tikhonov 
regularization term global convergence without rates was proved in [22]. 

(a) The proof of Proposition 3.3 together with the chain rule (Lemma 3.5) shows that 
the remainder 

T{u'') - J^{u) - g{u){u'' -u) 

is exactly zero for sufficiently small Hm" — u\\. Hence we expect convergence in one 
step sufficiently close to the solution, which is confirmed by the numerical results 
in the following section. 

Remark 3.15. The assumption on the injectivity of K may be relaxed. The proof of 
Corollary 3.12 shows that we only need that all submatrices Maa for A C [1, ko] are 
invertible. Hence, local superlinear convergence can also be proved when the K satisfies 
the finite basis injectivity (FBI) property [2]. The FBI property states that any submatrix 
of K consisting of a finite number of columns is infective. The FBI property is related to 
the so-called restricted isometry property (RIP), see e.g. [1], which plays an important 
role in the analysis of minimizers ofi^ constrained problems in the theory of compressed 
sensing [5]. 

4 Numerical Results 

In this section we present results of numerical experiments illustrating the performance 
of the semismooth Newton (SSN) method. We implemented the SSN method in 
MATLAB® and made experiments on a desktop PC with an AMD Athlon 64 X2. 
Moreover, we are going to compare the SSN method to other state-of-the-art methods 
for the minimization of constrained problems, namely the GPSR methods [14] and the 
ll_ls toolbox [17] where we used the freely available MATLAB® implementations of 
these methods. The GPSR method is based on gradient projection method with Barzilai- 
Borwein stepsizes and is known to converge r-linearly. The ll_ls method is a truncated 
Newton interior point method which is applied directly to the objective functional (note 
that we apply a Newton method to a reformulated optimality condition). In addition 
we included the widely used iterative soft-thresholding from [10] in our comparison. 
Note that both the GPSR and the ll_ls methods are set up and analyzed in a finite 
dimensional setting while our analysis on the SSN as well as the analysis for the iterative 
soft-thresholding is infinite dimensional. 

4.1. Inverse integration. The problem under consideration is the classical ill-posed 
problem of inverse integration (or differentiation [3, 15, 23]), i.e. the operator K : 



SSN Method for Sparsity Constraints 



14 



L2([0,1])^L2([0,1]) given by 

Ku{t) = I u{s)ds, t e [0,1]. 
Jo 

The data / is given as {f{tk))k=i,...,N with tk — j^k. We discretized the operator K by 
the matrix 



N 



(\ ... o\ 

■-. 
VI V 



The minimization problem reads 

^ AT TV 

,^1^ -^Y.^{Ku),- jff ^Y.'"^\''^\- (20) 

i=l fc=l 

One can check easily that the SSN method is also applicable in finite dimensions and 
hence, this minimization problem can be treated by the SSN method. The discussion of 
the SSN method in infinite dimension provides us with results which are independent 
of the dimension N ^ i.e. the algorithm scales well. 

The true solution u is given by small plateaus and hence the data — Ku + S 
is a noisy function with steep linear ramps. Figure 1 shows our sample data and the 
result of the minimization with the SSN method. Table 1 shows how the SSN method 
performed in this specific example. It can be observed that the residual is not decaying 
monotonically and it descends slowly in the beginning while it drops significantly in the 
last step. Moreover we observed in many examples that the algorithm shows a similar 
performance for a broad range of starting values Another important observation is 
that the performance of the algorithm depends on the value of 7. For too small as well 
as for too large values of 7 the algorithm does only converge when started very close to 
the solution. As a rule of thumb one could take 7 close to the reciprocal of the smallest 
singular value of the (in practice unknown) matrix Ai^A where A is the sparsity pattern 
of the solution. 

We made experiments to see how the SSN method depend on the noise level and the 
regularization parameter. First, we fixed the regularization sequence and changed the 
noise level. Hence, we solved the problem (20) for fixed = 500, fixed Wk = 10^^ and 
varied the noise level 5. Table 2 reports the results. Basically, a higher noise level leads 
to a smaller number of iterations but longer CPU-time (this is, because the active sets 
are larger during the iteration). Second we coupled the noise level and the regularizing 
sequence Wk- Since it is shown in [10, 18] that a parameter choice Wk oc 5 provides a 
regularization we used Wk = S. Hence, we solved the problem (20) for fixed A^ = 500 
and different noise levels S, see Table 3 for the results. Basically, the algorithm behaves 
similar for different noise levels, especially the CPU-time is always comparably small. 

Moreover, we made a simple experiment to assess how the computational cost grow 
with the size of the problem. We considered the inverse intergration problem with 
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Figure 1. Illustration of data and results of the inverse integration problem. Left: 
the true solution u with N = 500, middle: the noisy data / with 5% noise, right: the 
reconstruction by minimization with Wk = S ■ 10~^ and 7 = 5- 10^. The solution 
was obtained with the semismooth Newton method after 11 iterations with a residual 
norm of 1.7 • 10-i°. 



n 



1 
2 
3 
4 
5 



9 
10 
11 



1.3249e+01 
1.0461e+01 
5.3849e+00 
4.8393e+00 
4.2488e+00 
3.0433e+00 
2.8758e+00 
2.8237e+00 
2.7365e+00 
2.5984e+00 
2.5518e+00 



6 . 9764e+05 
2.1698e+02 
2 . 9586e+02 
8 . 3922e+02 
1 . 9864e+02 
1.5474e+02 
3.9127e+01 
3.5658e+01 
2.9485e+01 
7.7932e+00 
1.7423e-10 



Table 1. Illustration of the performance of the SSN method for the inverse integration 
problem. The second column shows the decay of the function value while the third 
column shows the norm of the residual. The data is the same as in Figure 1. 



problem size between 100 and 5000. We kept all parameters, as well as the data and 
the noise level fixed and only refined the discretization of the problem. We stopped the 
algorithms when a required residual tolerance was reached. Moreover, we checked if the 
reached functional value was equal for the different methods since the algorithms used 
different stopping criteria. Table 4 reports CPU times required for the SSN method, 
for GPSR, ll_ls and for the iterative thresholding. In Figure 2 the same data is coded 
graphically. When assuming that the computational cost is 0{N^) we found P — 2.71 
for GPRS, P = 2.70 for ll_ls, /3 = 2.15 for iterative thresholding, and P = 2.20 for SSN. 
Moreover, the constant hidden in the O notation is considerably smaller for the SSN 
method. The observed scaling differs from the results reported in [14] which may be due 
to the different structure of the examples. In [14] the example used a matrix which had 
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#iter CPU-Time (sec) 



1 


Oe+00 


5 


6 


66e-01 


6 


18e-10 


1 


Oe-01 


8 


7 


63e-01 


2 


19e-09 


1 


Oe-02 


12 


3 


98e-01 


7 


85e-10 


1 


Oe-03 


10 


3 


lle-01 


1 


29e-09 


1 


Oe-04 


11 


3 


22e-01 


9 


85e-10 


1 


Oe-05 


11 


3 


18e-01 


2 


49e-09 



Table 2. Behavior of the SSN method for different noise levels with fixed Wk = 10~^. 
The problem under consideration is the inverse integration, the problem size is = 500 
with 7 = 5-10^ throughout. The rightmost column shows the residual norm at 
convergence. 



#iter CPU-Time (sec) 



1 


Oe-01 


9 


1 


25e-01 


6 


94e-12 


1 


Oe-02 


11 


2 


61e-01 


2 


29e-ll 


1 


Oe-03 


11 


2 


96e-01 


1 


47e-10 


1 


Oe-04 


11 


3 


lOe-01 


2 


91e-ll 


1 


Oe-05 


11 


3 


25e-01 


7 


63e-ll 


1 


Oe-06 


12 


3 


50e-01 


4 


16e-ll 



Table 3. Behavior of the SSN method for different noise levels. The problem under 
consideration is the inverse integration, the problem size is N = 500. We chose 
Wk = \\5\\ and 7 = 5- 10^ throughout. The rightmost column shows the residual 
norm at convergence. 



all singular values either close to one or zero, while in our example the singular values 
converge to zero. Hence, it is expected that the empirical scaling of the computational 
costs differs from problem to problem. 

4.2. Deblurring in a Haeir basis. As a second example of an ill-posed problem we 
consider a blurring operator A : L^([0, 1]) L^([0, 1]) given by Au — k * u with the 
kernel k{x) — c{l + x^/X^)'^ with A = 0.01. We choose c such that J kdx — 1 and 
consider u to be extended periodically to IR in order to evaluate the convolution integral. 

In this example we work with a synthesis operator B : i'^ L^([0, 1]) mapping 
coefficients (c^) to a function u = Ckipk where the (ipk) form the orthonormal Haar 
wavelet basis [19]. Hence, the operator under consideration K = AB is a blurring after 
a Haar wavelet synthesis, see [6, 10] for discussions of penalty terms in combination 
with wavelet expansions. 

We start with a given function u which is piecewise constant. The data / is 
computed as f — Au + noise such that we have 25% relative error, i.e. ||/ — 74ii||/||/|| = 
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N 




SSN 




GPRS 




ll_ls 


iterthresh 


iUU 


o 
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Uoe Uz 




. zye Ui 


■1 
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. uie+uu 


2 


04e+01 


1 

LOV 


o 
O 


o / e Uz 


■1 
i 


. ft I e+uu 


■1 
i 


. oye+uu 


4 


03e+01 


ZZ'i 


c 
t3 


oie Uz 


ft 


. zBe+uu 


A 
t 


. ftue+uu 


1 


06e+02 


ooo 


-1 
1 


f ue ui 


-1 
1 


. ibe+ui 


1 


. ize+ui 


2 


43e+02 


501 


2 


83e-01 


2 


.99e+01 


2 


.37e+01 


5 


78e+02 


750 


6 


84e-01 


1 


.36e+02 


7 


.69e+01 


1 


26e+03 


1122 


2 


20e+00 


2 


.65e+02 


2 


.42e+02 


2 


59e+03 


1679 


6 


37e+00 


1 


. 22e+03 


9 


.95e+02 


7 


42e+03 


2512 


1 


87e+01 


3 


.64e+03 


3 


. 88e+03 


1 


95e+04 


5000 


1 


28e+02 


2 


.51e+04 


3 


. 56e+04 


9 


45e+04 



Table 4. Comparison of the CPU time in seconds for the different algorithms 
and different sizes of the problem. The problem under consideration is the inverse 
integration problem with 5% noise and regularization parameter Wfe = 3 • 10~^. 




10^ io' 

Problem size TV 



Figure 2. The empirical growth of the computational cost for the different algorithms. 



0.25. The Haar coefficients of u have been reconstructed by minimizing (1). As an 
illustration of penalties in contrast to classical regularization we also show the 
results of the minimization of 

-\\Kc- + ^Wk\ck\' . 
k=l 

Figure 3 and Tabic 5 show the results of both and the above £^ regularization where 
we discretized the problem to 1024 Haar wavelets. The parameters Wk are independent 
of k and have been tuned by hand to produce optimal results. Since the original data 
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is quite sparse in the Haar wavelet basis, the ^^-reconstruction leads to much better 
results, as expected from the model. It also turned out that the algorithm is robust 
with respect to different initial values u^. We tested several initial values (starting at 
zero, at K*f or at a random position) and the observed convergence behavior was very 
similar in all cases. 

The SSN method converged in six iterations and in 0.3 seconds (for comparison: 
the GPSR method takes 0.5 seconds and ll_ls converged in 5.3 seconds). 

2 

1.5 

1 



0.5 





0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1 

2| 

1.5 
1 

0.5 

0^ 

0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1 

Figure 3. The results of £^ and i'^ (classical Tikhonov) regularization of deblurring 
in a Haar basis. Upper left: the true solution u, upper right: the given data /, lower 
left: the reconstruction by f.^ minimization with Wk = 0.12 and 7 = 5- 10®, lower right: 
the reconstruction by i'^ minimization with Wk = 0.05. 

4.3. Compressive Sampling. In our last example we illustrate the applicabihty of 
the SSN method to the decoding problem in compressive sampling alias compressed 
sensing (CS). In CS one aims at reconstructing a signal from very few linear 
measurements, see [4, 11] for an introduction to CS. A popular way of decoding a 
signal from data / which was measured by the observation operator K is to minimize a 
functional of type (1), sec [5]. Our example on compressive sampling is taken from [14]. 
We obtain an observation operator K G R^^^ by first filling it with independent 
samples of a standard Gaussian distribution and then orthonormalizing the rows. Hence, 
the operator is not injective but it possesses the so-called restricted isometry property 
(see [1]) which means that all submatrices consisting of a small number of columns have 
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1 3.3920e+001 

2 1 . 3905e+002 

3 1.3326e+001 

4 7.9347e+000 

5 6.0006e+000 

6 5.9823e+000 



2 . 9676e+006 
3.2499e+004 
6 . 2647e+005 
1.7517e+004 
8.0510e-002 
1 . 5424e-009 



Table 5. Illustration of the performance of the SSN method for deblurring in a Haar 
basis. The second column shows the decay of the function value \1/ while the third 
column shows the norm of the residual. The data is the same as in Figure 3. 



singular values close to one. Especially, submatrices made of a small number of columns 
are injective. Hence, the SSN method works as long as the active sets are small enough. 

In this example we chose N — 8192, K — 512, and the signal u contained 64 
randomly placed ±1 spikes. The observation / was generated by / = Ku + noise such 
that we have 5% relative error. The minimization of (1) with w — 0.05 was done with the 
SSN method with parameter 7 = 5- 10^. The SSN method converged in approximately 
1.2 seconds in six iterations and the active sets stayed very small during the iteration, 
see Figure 4 and Table 6. Hence, the SSN method is a promising candidate for the 
decoding problem in CS. 



Figure 4. Illustration of data and results of the CS example. Left: the original signal 
u with n = 8192, right: the reconstruction by £^ minimization with Wk = 0.05 and 
7 = 5- 10''. The solution was obtained with the scmismooth Newton method after 6 
iterations with a residual norm of approximately 1 • 10~^^. 



5 Conclusion 

We have shown that the semismooth Newton method applied to Tikhonov functional 
with sparsity constraints is a fast algorithm which is easy to implement as an active 
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n 














\A\ 


condf A A 


1 


2 


0774e+00 


1 


2254e+04 


252 


44.52 


2 


7 


1752e+00 


2 


2715e+03 


148 


12.77 


3 


2 


7379e+00 


4 


6644e+02 


90 


6.31 


4 


1 


9997e+00 


1 


4674e+02 


67 


4.75 


5 


1 


8386e+00 


3 


9728e+01 


67 


4.60 


6 


1 


8361e+00 


9 


9652e-12 


67 





Table 6. Illustration of the performance of the SSN method for CS. The second 
column shows the decay of the function value \I' while the third column shows the 
norm of the residual. The forth and fifth column show the size of the active set and 
the condition of the matrix which has to be inverted in the Newton step. The 

data is the same as in Figure 4. 



set method. Each iteration involves the solution of a system of linear equations on 
the active coefficients only. Our numerical experiments show that these systems stay 
reasonably small during the iteration and are also very well conditioned. In addition, 
the experiments indicate that the SSN method compares favorably with existing state- 
of-the-art methods when applied to ill-posed problems. While we investigated only 
the local convergence behavior, the numerical experiments indicate that our method is 
robust with respect to the initial value of the iteration. However, the convergence is slow 
as long as the iterates are far from the minimizer and it gets faster when the solution 
is approached. The global convergence properties are not yet explained by our theory 
and need further investigation. Another direction for further research is globalization 
of the method e.g., by the use of an appropriate merit function, and line search or trust 
region methods. 



Appendix 

We define for u E i'^ and h E H 

F(u) = J2^M G(h) ^ -\\h - fWl^ 

k=l 

and calculate their conjugate (polar) functions, see [12, Ch. 1.4]. We have 

oo 

F*{p) = sup {{p,u) - F{u)) = sup {{p,u) - J^Wfe l^fcl) 



e£2 



k=l 



^^P {^(Pk- '^k sign Uk) Uk) 



0, if \pk\ < Wk for all k 



u 1^^^ I oo otherwise. 
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For G, we obtain 

G*{p) = sup ((p, h) - G{h)) = sup ((p, h) -h\h- /liy = + (p, /), 

since the supremum is attained at h — p + f. 
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